Abstract

A particle-swarm is a set of indivisible processing elements that traverse a network in order
to perform a distributed function. This paper describes a particular implementation of a
particle-swarm that can simulate the behavior of the popular PageRank algorithm in both
its global-rank and relative-rank incarnations. PageRank is compared against the
particle-swarm method on artificially generated scale-free networks of 1,000 nodes constructed
using a common gamma value, $\gamma = 2.5$. The running time of the particle-swarm algorithm
is $O(|P| + |P|t)$, where $|P|$ is the size of the particle population and $t$ is the number of
particle propagation iterations. The particle-swarm method is shown to be useful due to its
ease of extension and running time.
1 Introduction



Influence, prestige, impact, and authority refer to a class of network metrics that utilize
the structure of a graph, $G = \{N, E\}$, to derive an influence ranking, $I \in \mathbb{R}^{|N|}$,
over all its constituent nodes. Generally these metrics determine a node's importance
in a recursive fashion. A node's influence, $I_k$, is a function of the influence
of the nodes that project to it. This idea is represented in Eq. (1), where $e_{j,k}$ is a
directed edge from $n_j$ to $n_k$, $\mathrm{out}(n_j)$ is the set of outgoing edges from node $n_j$, and
$t$ is the current iteration represented in discrete time. The collection of influences
across all nodes in the network is represented by the vector $I$ which, upon convergence,
is the principal eigenvector of the adjacency matrix formed by the graph [1].
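
The recursion described above can be sketched in a few lines of Python. This is only an illustration under an assumption: each node is taken to split its influence equally across its outgoing edges. The function name `propagate` and the three-node toy graph are hypothetical and do not come from the paper.

```python
def propagate(out_edges, influence):
    """One discrete time step: every node passes its influence, split
    equally, along its outgoing edges (an assumed form of the recursion)."""
    nxt = {n: 0.0 for n in out_edges}
    for j, targets in out_edges.items():
        for k in targets:
            nxt[k] += influence[j] / len(targets)
    return nxt

# Hypothetical toy graph: a -> b, a -> c, b -> c, c -> a
out_edges = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
influence = {n: 1.0 / len(out_edges) for n in out_edges}
for _ in range(50):
    influence = propagate(out_edges, influence)
```

On this toy graph the vector settles near (0.4, 0.2, 0.4) for (a, b, c), which is the fixed point of the update.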
Since the inception of these algorithms there has been a strong focus on global-rank,
$I(n|N)$ or simply $I(n)$, and only recently has there been research interest in
relative-rank, $I(n|R)$, where $R \subseteq N$ [2]. Global-rank determines the relative
influence of each node with respect to the entire node population, $N$, while
relative-rank determines the relative influence of each node with respect to a
particular subset of the network, $R \subseteq N$. Global-rank algorithms have found
themselves at the forefront of web search techniques: PageRank [1], HITS [3], and their
respective extensions. Biased, or relative, ranking has found application in
domain-specific authority using web-page networks [4], company-specific idea influence
using collaboration networks [2], and manuscript-specific peer-review influence
using co-authorship networks [5]. It is important to note that global-rank can be
interpreted as a special case of relative-rank where each node's influence is
calculated relative to a root node set that is the entire node population, $R = N$.



The contribution set forth by this paper is twofold. First, this paper demonstrates
the application of particle-swarms to the calculation of two popular influence
metrics: PageRank (global-rank) [1] and PageRank-Priors (relative-rank) [2]. The
particle-swarm algorithm is useful because of its running time and flexibility.
Unlike most popular implementations, a particle-swarm has a more tangible appeal
that lends itself to various functional modifications. This paper provides only
the rudimentary data structures and functions necessary to simulate PageRank
and PageRank-Priors, but the framework leaves room for possible extensions.
The second contribution of this paper is an introduction to the use
of particle-swarms in the broader context of graph analysis and manipulation.
Currently there is little research in this area. Of the manuscripts found, most
analyze graphs from the perspective of a single random-walker and do not include
more advanced functions and properties such as particle energy, decay, and
teleportation [6-9].



The outline of the paper is as follows. Section 2 discusses both PageRank and
PageRank-Priors from the standpoint of an object-oriented random-walker model.
Section 3 describes the graph theoretic model of the particle-swarm method
with emphasis on the various parameters and functions of the particles as they apply
to simulating PageRank and PageRank-Priors. Section 4 compares both PageRank
algorithms and the particle-swarm algorithm on artificially generated scale-free
networks. Section 5 discusses the running time of the particle-swarm method
and two optimizations. The paper concludes, in Section 6, with a short discussion of
related PageRank algorithm implementations.



2 Random-Walker Model 

Both PageRank [1] and PageRank-Priors [2] can be described in a random-walk
fashion where a stochastic token, or particle, moves throughout a network, $G$. The
rank influence of any node $n_k \in N$ is the probability that the particle-token, $p$, will
be seen at that node, $I_k = P(p|n_k)$. This conceptual analogy is explicitly
represented within the object-oriented framework of this paper as a swarm of
particle-tokens, $P$, that traverse the network landscape depositing their energy footprint on
each node they traverse. In doing so, the particles generate an influence ranking of
the nodes in terms of the normalized energy distribution, $I$, of the node population.

2.1 PageRank Walker 

The PageRank algorithm, as described in [1], was the driving force that carried
the Google search engine to the forefront of web search-engine technology.
Simply speaking, the algorithm is calculated in a recursive fashion where a
particular page in a network of web-pages is influential if it is referenced by, or linked
from, other influential pages. Imagine a random-walker, $p$, traversing a network
of web-pages such as the World Wide Web, $G = \{N, E\}$. If that random-walker
continuously finds itself at a particular page $n_k$, then that random-walker is said to
have a high probability of being at that web-page. This probability is interpreted
as the page's, or node's, influence. The random-walker is consistently located at
that web-page because the incoming edges to $n_k$, $\mathrm{in}(n_k) \subseteq E$, are either numerous,
nearing the limit $|\mathrm{in}(n_k)| \approx |E|$, or the nodes that point to $n_k$ have a numerous
set of incoming edges which allow the random-walker to consistently reappear at
$n_k$. Taken to its recursive limit, a node's influence is a measure of all the aggregate
influence it receives from the pages pointing to it, whether directly or indirectly.

A dampening-factor, $\lambda \in [0,1]$, can be introduced to reduce the spread of
influence over time [10]. The further the random-walker travels, the less influence the
random-walker should have, such that at full dampening, $\lambda = 1.0$, the
random-walker cannot take a step and all nodes are ranked equivalently, $I_{k,t=0} = \frac{1}{|N|}$. The
combination of random-walker propagation and dampening is expressed in Eq. (2).
The first block of the equation represents the proportion of influence distributed
to $n_k$ by $n_j$. This can also be interpreted as the probability of the random-walker
taking the edge given the condition that its current location is $n_j$. The second
block of the equation provides the equal distribution of influence incurred through
dampening. Notice that $\lambda$ serves as the scaling variable modulating the influence of
each block on the influence vector, $I$.
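
The two blocks described above can be illustrated with a short Python sketch. The update form used here, a propagation term scaled by $(1 - \lambda)$ plus an equal share $\lambda / |N|$, is an assumption chosen to match the prose (at $\lambda = 1.0$ every node receives $\frac{1}{|N|}$). The function name `pagerank` and the toy graph are hypothetical, and every node is assumed to have at least one outgoing edge.

```python
def pagerank(out_edges, lam=0.15, iters=100):
    """Dampened influence update (assumed form); lam = 1.0 gives 1/|N| everywhere.
    Assumes every node has at least one outgoing edge."""
    n = len(out_edges)
    rank = {node: 1.0 / n for node in out_edges}
    for _ in range(iters):
        # second block: equal distribution of influence due to dampening
        nxt = {node: lam / n for node in out_edges}
        for j, targets in out_edges.items():
            for k in targets:
                # first block: influence passed from n_j to n_k along an edge
                nxt[k] += (1.0 - lam) * rank[j] / len(targets)
        rank = nxt
    return rank

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}  # hypothetical toy graph
uniform = pagerank(graph, lam=1.0)
ranked = pagerank(graph, lam=0.15)
```

Here `uniform` assigns one third to each node, while `ranked` favors node `c`, which has two incoming edges.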
2.2 PageRank-Priors Walker



The priors idea was first proposed by [2] in their formalization of a relative-rank
extension to both PageRank and HITS. Suppose the network data structure,
$G = \{N, E\}$, is supplied with a root node set, $R \subseteq N$. This root set is the set of
nodes relative to which all other nodes are ranked. Suppose that at each time step the
random-walker has a probability, $\beta$, of 'teleporting' to a particular node $r \in R$ as defined by
the probability distribution $p(r) = \frac{1}{|R|}$. This means that if the random-walker
decides to teleport home, which occurs with probability $\beta$, then the random-walker
chooses a random node in the root node set, $R$. A variation of the algorithm
can bias the probability distribution over $R$.



As $\beta$ approaches 1.0 the probability of seeing the random-walker at any node in
$R$ becomes greater and therefore the influence of the nodes in $R$, as well as those
nodes that $R$ projects to, increases. At the limit, when $\beta = 1.0$, the influence
of all $n \notin R$ is 0.0 and the influence of all $n \in R$ is $\frac{1}{|R|}$. In this way, the
random-walker biases the ranking of the network nodes, $N$, towards the subset
$R$. When $\beta = 0.0$ there is still a bias towards the root node set since the
random-walker initiates its walk from that set, but the probability of the random-walker's
location diffuses over the network as the number of iterations increases.
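
The limiting behavior at $\beta = 1.0$ can be checked with a small sketch. It assumes the update mixes a propagation term scaled by $(1 - \beta)$ with a uniform prior over $R$ scaled by $\beta$. That form, the function name `pagerank_priors`, and the toy graph are illustrative assumptions rather than the formulation given in [2].

```python
def pagerank_priors(out_edges, roots, beta=0.3, iters=100):
    """Relative-rank sketch: with probability beta the walker jumps to a
    uniformly chosen root node; otherwise it follows an outgoing edge."""
    prior = {n: (1.0 / len(roots) if n in roots else 0.0) for n in out_edges}
    rank = dict(prior)  # the walk starts from the root set
    for _ in range(iters):
        nxt = {n: beta * prior[n] for n in out_edges}
        for j, targets in out_edges.items():
            for k in targets:
                nxt[k] += (1.0 - beta) * rank[j] / len(targets)
        rank = nxt
    return rank

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}  # hypothetical toy graph
at_limit = pagerank_priors(graph, roots={"a"}, beta=1.0)
biased = pagerank_priors(graph, roots={"a"}, beta=0.3)
```

With `beta=1.0` all influence stays on the root node `a`, matching the limit described above.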



The next section extends the random-walker model to a particle-swarm model
where a collection of random-walkers, $P$, traverse the network depositing an energy
footprint at each step of the way. These energy footprints, as stored in the node's
'memory', $I_k \in I$, represent the probability of having a particle at that particular
node. It is important to note that the random-walker model can be easily extended
to account for weighted graphs, $G = \{N, E, W\}$, where the weights of the outgoing edges of
a node are normalized to create a probability distribution. This probability
distribution biases the random-walker's decision when taking an outgoing edge, and in
such cases the walker is called a biased random-walker. In this way, weighted PageRank and
weighted PageRank-Priors can be calculated. The next section discusses the full
weighted model of the particle-swarm framework, though the simulations are only
for the non-weighted PageRank and PageRank-Priors counterparts.



3 Particle-Swarm Model 



A particle-swarm, $P$, is a collection of unique processing entities that, by traversing
a network in a stochastic manner, collectively perform a distributed function. In
relation to the random-walker model, a particle-swarm is simply a collection of many
random-walkers. The unification of the network particles, nodes, roots, edges, and
weights forms the data structure $G = \{P, N, R, E, W\}$ where each edge is assigned
a weight, $|E| = |W|$, and $R \subseteq N$. A single particle can contain any number
of properties and behaviors, but for the purposes of this paper only those
properties and behaviors that apply to PageRank and PageRank-Priors are described,
$p = \{\epsilon, \delta, h, \beta, c\}$. A particle is an indivisible entity, but its local energy content,
$\epsilon_i \in [0, 1]$, is not. Each time a particle traverses an edge, its local energy content is
affected by a decay-scalar, $\delta_i \in [0, 1]$, which is related to the dampening-factor, $\lambda$,
described previously. To simulate PageRank-Priors a particle must have a reference to
its originating, or root, node, $h_i \in R$, so that it can 'teleport' home as determined by
a back-probability, $\beta_i \in [0, 1]$, and a back selection function, $B(\beta_i) \in \{0, 1\}$. Finally,
a particle traverses an outgoing edge from its current node location, $c_i \in N$,
according to an edge selection function, $\Theta(\mathrm{out}(c_i))$, which returns an edge $e_{i,j} \in \mathrm{out}(c_i)$.
These properties and functions are enumerated below for ease of reference. Note
that in order to simulate PageRank and PageRank-Priors, $\delta$ and $\beta$ are the same for
every particle in the simulations to follow, $\forall i,j : \delta_i = \delta_j$ and $\beta_i = \beta_j$. An obvious
extension to this framework is to assign unique $\delta$ and $\beta$ values to different particles.

(1) $\epsilon$: a local energy value, $\epsilon \in [0, 1]$

(2) $\delta$: an energy decay-scalar, $\delta \in [0, 1]$

(3) $h$: a reference to its home, or root, node, $h \in R$

(4) $\beta$: a back-probability, $\beta \in [0, 1]$

(5) $c$: a reference to the current node location, $c \in N$

(6) a probabilistic back selection function, $B(\beta) \in \{0, 1\}$

(7) a probabilistic outgoing edge selection function, $\Theta(\mathrm{out}(c))$, which returns $e_{i,j} \in \mathrm{out}(c)$
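
The properties and functions listed above map naturally onto a small object. The following is a minimal sketch, not the paper's implementation: the class name `Particle`, its field names, and the method names `back_select` and `select_edge` are all hypothetical, and nodes are represented as plain strings.

```python
import random
from dataclasses import dataclass

@dataclass
class Particle:
    energy: float   # epsilon: local energy value in [0, 1]
    decay: float    # delta: energy decay-scalar in [0, 1]
    home: str       # h: reference to the home (root) node
    back: float     # beta: back-probability in [0, 1]
    current: str    # c: reference to the current node location

    def back_select(self, rng):
        """B(beta): returns 1 with probability beta, otherwise 0."""
        return 1 if rng.random() < self.back else 0

    def select_edge(self, targets, weights, rng):
        """Edge selection: pick one outgoing target, biased by edge weight."""
        return rng.choices(targets, weights=weights, k=1)[0]

rng = random.Random(42)
p = Particle(energy=1.0, decay=0.15, home="a", back=0.0, current="a")
target = p.select_edge(["b", "c"], [0.5, 0.5], rng)
```

Keeping `back` and `decay` as per-particle fields leaves room for the extension mentioned above, in which different particles carry different values.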

A network node, $n_k$, is represented by the triplet $\{P(n_k), \mathrm{out}(n_k), I_k\}$, where $P(n_k)$
is a unique set of particles located at $n_k$, $\mathrm{out}(n_k)$ is a unique set of outgoing edges
from $n_k$, and $I_k \in \mathbb{R}$ is $n_k$'s local energy value. Any edge in the network, $e_{k,j}$, is a
directed edge from $n_k$ to $n_j$ with an associated weight, $w_{k,j} \in [0, 1]$. The weights
of the set of all outgoing edges from any node, $\mathrm{out}(n_k)$, must be normalized to
create a probability distribution for each particle's propagation function (Eq. 3).
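
A minimal sketch of the normalization step, assuming the outgoing weights of one node are held in a dictionary keyed by target node. The function name `normalize_out_weights` and the example weights are hypothetical.

```python
def normalize_out_weights(weights):
    """Scale one node's outgoing edge weights so that they sum to 1.0."""
    total = sum(weights.values())
    if total == 0:
        raise ValueError("node has no positive outgoing weight")
    return {target: w / total for target, w in weights.items()}

# Hypothetical raw weights on the two outgoing edges of a single node
probs = normalize_out_weights({"b": 2.0, "c": 6.0})
```

The result can be used directly as the probability distribution from which a particle draws its next edge.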
Initially, a set of nodes in the network is seeded with a collection of particles, $P$.
The initial particle distribution can be an equal distribution or a biased
distribution depending on the desired functional output. For global-rank metrics, each
node in the network is provided with an equal initial distribution, $|P(n_k)| = \frac{|P|}{|N|}$,
while for relative-rank methods only an initial root set, $R \subseteq N$, is provided
with particles, $|P(r_k)| = \frac{|P|}{|R|}$ where $r_k \in R$.

At each time step of the algorithm, a particle performs three behaviors. First, the
particle increments its current node's energy content, $I_k$, with its current energy
content, $\epsilon_i$, by way of $I_{k,t+1} = I_{k,t} + \epsilon_{i,t}$ (Alg. 1, line 16). Next, the particle decays its
energy content by the parameterized decay-scalar, $\delta_i$ (Eq. 4; Alg. 1, line 17).

$$\epsilon_{i,t+1} = \epsilon_{i,t} - (\delta_i \epsilon_{i,t}) \qquad (4)$$

Lastly, the particle calculates $B(\beta_i)$ (Alg. 1, line 18). If the function returns 1, then the
particle returns home, $c_{i,t+1} = h_i$. If the function returns 0, then the particle
chooses an outgoing edge of its local node, $e_{i,j} = \Theta(\mathrm{out}(c_i))$ (Alg. 1, line 26). The
outgoing edge chosen, $e_{i,j}$, determines the particle's new nodal reference, $c_{i,t+1} = n_j$.
A particle's death occurs when $\epsilon_i = 0.0$. Since the decay function of the particle
is based on a percentage of its current energy content, formally the particle
energy will approach, but never reach, 0.0. Therefore, a threshold for particle death is
given when $\epsilon_i < \vartheta$. For the purposes of these simulations an arbitrarily low $\vartheta$ was
chosen to be $10^{-8}$. Unlike the 'random teleport' functionality of most PageRank
implementations, if node $c_i$ does not have an outgoing edge, then the particle is
destroyed (Alg. 1, line 22). Once all the particles in the network have died or
a desired $t$ has been reached, the particle propagation algorithm is complete. The
energy content, $I_k$, of all nodes can be normalized to yield the proportion of energy
every node has with respect to one another. This proportion can be interpreted as
the probability of seeing a random-walker at that particular node. The aggregated
values of all energy in the network form the influence vector $I$.

The particle-swarm framework encapsulates both aspects of PageRank and
PageRank-Priors while allowing for both implementations to be run in their original form. For
example, to simulate PageRank, $\beta = 0.0$ and $\delta \in [0, 1]$. To simulate PageRank-Priors,
$\beta \in [0, 1]$ and $\delta = 0.0$. A benefit of this framework is that hybrid algorithms can be
implemented by combining back-probability, $\beta$, and energy decay, $\delta$, in the same
simulation.



The pseudocode for the particle-swarm implementation of PageRank is provided
in Alg. 1. The first functional block expresses a particle-distribution algorithm
and the second block expresses the particle-propagation algorithm. To implement
PageRank-Priors, the loop on line 3 should run through $R$, not $N$, and a desired $\beta$
should be set at line 6. An overview of the Big-O running times of the two
functions is presented in their respective comments and will be examined more
closely in Section 5.



     1  ♦ distribute particles: O(|N| particlesPerNode) = O(|P|);
     2  int i = 0;
     3  foreach (n_k ∈ N) do
     4      int particlesPerNode = 10;
     5      for (l = 0, l < particlesPerNode, l++) do
     6          ε_i = 1.0; δ_i = 0.15; h_i = n_k; β_i = 0.0; c_i = n_k;
     7
     8      end
     9  end
    10  ♦ disseminate particles: O(|P|t);
    11  int t = 0;
    12  while (t < pageIterations) do
    13      t++;
    14      for (i = 0, i < |P|, i++) do
    15          if (ε_i > ϑ) then
    16              I_{c_i} = I_{c_i} + ε_i;
    17              ε_i = ε_i - (δ_i * ε_i);
    18              if (B(β_i) == 1) then
    19                  c_i = h_i;
    20              end
    21              else
    22                  if (|out(c_i)| == 0) then
    23                      ε_i = ϑ;
    24                  end
    25                  else
    26                      c_i = Θ(out(c_i));
    27                  end
    28              end
    29          end
    30      end
    31  end

Algorithm 1: Particle-Swarm implementation of PageRank
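
For readers who want to experiment, the two blocks of the algorithm can be approximated in runnable Python. This is a sketch under assumptions, not the authors' implementation: the function name `particle_swarm_rank`, its parameters, the list-based particle layout, and the unweighted uniform edge choice are all illustrative. Every node is assumed to appear as a key in `out_edges`.

```python
import random

def particle_swarm_rank(out_edges, roots=None, particles_per_node=10,
                        delta=0.15, beta=0.0, max_iters=25,
                        threshold=1e-8, seed=0):
    """Sketch of particle distribution and dissemination on an unweighted graph."""
    rng = random.Random(seed)
    energy_at = {n: 0.0 for n in out_edges}           # I_k for every node
    seeds = list(out_edges) if roots is None else list(roots)
    # distribute particles; each particle is [energy, home, current]
    swarm = [[1.0, n, n] for n in seeds for _ in range(particles_per_node)]
    # disseminate particles
    for _ in range(max_iters):
        for p in swarm:
            if p[0] <= threshold:
                continue                               # particle is dead
            energy_at[p[2]] += p[0]                    # deposit energy on the node
            p[0] -= delta * p[0]                       # decay the particle energy
            if rng.random() < beta:                    # back selection returned 1
                p[2] = p[1]                            # teleport home
            elif not out_edges[p[2]]:
                p[0] = threshold                       # no outgoing edge: particle dies
            else:
                p[2] = rng.choice(out_edges[p[2]])     # follow a random outgoing edge
    total = sum(energy_at.values())
    return {n: e / total for n, e in energy_at.items()}

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}      # hypothetical toy graph
rank = particle_swarm_rank(graph)
home = particle_swarm_rank(graph, roots=["a"], delta=0.0, beta=1.0)
```

Passing `roots` restricts the seeding loop to a root set and a nonzero `beta` enables the return-home behavior, which mirrors the two changes the text describes for simulating PageRank-Priors.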
The next section provides simulation results of the aforementioned particle-swarm
algorithm with varying parameters. The results of these simulations are
compared to the results given by PageRank, PageRank-Priors, and In-Degree.



4 Simulation Correlations 

This algorithm test suite was originally run on random networks and scale-free
networks of varying $\gamma \in [2.0, 3.0]$ and size $|N| \in [100, 10000]$ with insignificant
variation in the particle-swarm's simulation performance. Since network size
and topology are not dimensions for analysis, only a collection of scale-free
networks of $\gamma = 2.5$ and $|N| = 1000$ is used for the remainder of the paper. For
scale-free construction, each node is given a predetermined set size for its
incoming connections as defined by Eq. (5), where the random number $\psi \in [0, 1]$,
$|\mathrm{in}(n_k)| \le |N| - 1$, and $\mathrm{in}(n_k)$ is the set of incoming edges to $n_k$ [11].



From here nodes randomly connect to one another until their maximum incoming 
connectivity is reached, at which point the network construction algorithm is com- 
plete. By predetermining the maximum incoming connectivity of a node in this 
way, the topology of the network maintains a small portion of node hubs and a rel- 
atively large portion of sparsely connected nodes which is characteristic of many 
naturally occurring networks [12]. 
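
The in-degree assignment step can be sketched as follows, assuming Eq. (5) has the inverse-transform form $\lfloor \psi^{-1/(\gamma - 1)} \rfloor$ capped at $|N| - 1$. The function name `sample_in_degrees` is hypothetical, and the random wiring of edges that follows the assignment is not shown.

```python
import math
import random

def sample_in_degrees(n_nodes, gamma=2.5, seed=0):
    """Draw a maximum in-degree per node from a power-law-like distribution."""
    rng = random.Random(seed)
    degrees = []
    for _ in range(n_nodes):
        psi = 1.0 - rng.random()                 # uniform in (0, 1]; avoids psi == 0
        k = math.floor(psi ** (-1.0 / (gamma - 1.0)))
        degrees.append(min(k, n_nodes - 1))      # enforce |in(n_k)| <= |N| - 1
    return degrees

degrees = sample_in_degrees(1000, gamma=2.5)
```

Most sampled values are small and a few are large, which is the hub-and-periphery shape described above.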

4.1 In-Degree as a Trivial Case of PageRank and Particle-Swarm 

The trivial case of the random-walker model is when the random-walker is only
allowed to take one step. This is a method for calculating the influence of a node
with respect to In-Degree and is an extreme case of PageRank as $\lambda \to 1.0$ and
$\delta \to 1.0$, or when the algorithm is halted at $t = 1$. To simulate In-Degree, each edge
in the network must be traversed at $t = 1$. To accomplish this, every node is
supplied with a collection of random-walkers proportional to its outgoing edge size,
$|P(n_j)| = \alpha|\mathrm{out}(n_j)|$ where $\alpha \in \mathbb{N}^+$. Now if each random-walker has an equal
probability of taking any outgoing edge, then at $t = 1$ the distribution of
random-walkers across the set of nodes $N$ is the In-Degree influence of each node (Eq. 6).



$$|\mathrm{in}(n_k)| = \left\lfloor \psi^{-[1.0/(\gamma - 1.0)]} \right\rfloor \qquad (5)$$



$$I_k = \sum_{e_{j,k} \in \mathrm{in}(n_k)} \frac{|P(n_j)|}{|\mathrm{out}(n_j)|} \qquad (6)$$
Since the set of all $e_{j,k}$ is $\mathrm{in}(n_k)$, substituting $\alpha|\mathrm{out}(n_j)|$ for $|P(n_j)|$ allows Eq.
(6) to be represented as $I_k = \alpha|\mathrm{in}(n_k)|$. This equation produces an influence
calculation perfectly correlated to In-Degree. Given that this is a probabilistic particle,
stochastic noise will disrupt the probability that each outgoing edge of every node
is taken once and only once. As the size of the initial distribution of particles
increases, that is, as $\alpha$ increases, the noise is reduced and the appropriate In-Degree influence
vector is returned. If the distribution of random-walkers is equal, $|P(n_k)| = \frac{|P|}{|N|}$,
then only an approximation of In-Degree can occur. In such cases, the more uniform
the distribution of outgoing edges across the nodes, the more accurate the
approximation.



Now that In-Degree has been described as a trivial case of PageRank and the
particle-swarm method, both methods are used to approximate the In-Degree
influence vector, $I_{IN}$. To simulate In-Degree influence using PageRank, $\lambda$ was scaled
between 0.005 and 0.995 to produce the correlation plot (Fig. 1a). The experiment
is limited to $\lambda = 0.995$ because when $\lambda = 1.0$ there is no
deviation in the rank vector, $I_k = \frac{1}{|N|}$. It is shown that PageRank best approximates
In-Degree at the limit as $\lambda \to 1.0$, $C = 0.998$. Next, the particle-swarm method for
simulating In-Degree was evaluated using various initial particle distribution sizes
of $|P(n_k)| \in [1, 20]$, $|P| \in [1000, 20000]$, and $\beta = 0.0$. The $\delta$ of each particle was
scaled from 0.005 to 0.995, and as $\delta \to 1.0$ In-Degree influence is approximated
most closely, $C = 0.997$ (Fig. 1b). Figure 1b is composed of 20 superimposed
particle distribution size plots. Note that the divergent plot in Figure 1b occurs when
$|P(n_k)| = 1$, $|P| = 1000$. The following influence vector relationship exists
between these three algorithms: $I_{IN} \approx I_{\lambda \to 1.0} \approx I_{\delta \to 1.0}$. Notice that PageRank and the
particle-swarm method are nearly equivalent in their behavior for their respective
$\delta = \lambda$ values, $I_\lambda \approx I_\delta$ when $|P(n_k)| > 1$.




Fig. 1. a. PR vs. IN over $\lambda \in [0.005, 0.995]$ (x-axis: dampening-factor) b. PS vs. IN over $\delta \in [0.005, 0.995]$ (x-axis: decay-scalar)



4.2 Correlating Particle-Swarm to PageRank and PageRank-Priors 

To simulate the results of PageRank (global-rank), the decay-scalar $\delta$ was varied
between 0.005 and 0.995 for every potential dampening-factor $\lambda$ between 0.005 and
0.995. The iterations of the particle-swarm method were constrained to $t_{PS} = t_{PR}$,
where $t_{PS}$ and $t_{PR}$ are the number of iterations for the particle-swarm method
and PageRank, respectively. Note that when $\delta$ is high, particle death can occur
before the iterations are complete. For this experiment $|P(n_k)| = 10$ and
$|P| = 10000$. Figure 2a shows that an equal distribution of particles across all of $N$
with $\beta = 0.0$ simulates the respective PageRank calculation with a Pearson
correlation near 1.0 when $\delta = \lambda$.
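
Since the comparisons in this section are reported as Pearson correlations between influence vectors, a minimal sketch of that measure may be useful. The function name `pearson` and the example vectors are hypothetical, and both inputs are assumed to be equal-length sequences that are not constant.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Two influence vectors that differ only by scale correlate perfectly
c_same = pearson([0.4, 0.2, 0.4], [0.8, 0.4, 0.8])
c_reversed = pearson([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

Because the measure is scale-invariant, the unnormalized node energies and the normalized influence vector give the same correlation against a reference ranking.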




Fig. 2. a. PR vs. PS over $\delta$ and $\lambda$ b. PRP vs. PS over $\beta$



PageRank-Priors (relative-rank), on the other hand, is a function of two variables:
the size of the root node set, $R$, and the back-probability, $\beta$. The root node set
was determined by randomly assigning a portion of the node population to $R$,
$R = f(N, \psi)$ where the percentage $\psi \in [0.01, 1.0]$ and $|R| = \psi|N|$. The selection
of the root node set had limited effect on the correlation between PageRank-Priors
and the particle-swarm method. Therefore, to represent the correlations in a 3D
plot, the $\psi$ factor was omitted (Fig. 2b). The iterations of the particle-swarm method
were constrained to $t_{PS} = t_{PRP}$, where $t_{PRP}$ is the number of iterations required
for PageRank-Priors to converge. Furthermore, $\delta = 0.0$ since PageRank-Priors has
no dampening-factor parameter. Figure 2b provides the correlation values of the
particle-swarm's $\beta_{PS} \in [0.1, 1.0]$ for all $\beta_{PRP} \in [0.1, 1.0]$ of PageRank-Priors. The
root node set was generated from 10% of the node population, $\psi = 0.10$; therefore,
when $|P(r_k)| = 10$, $|P| = 1000$. PageRank-Priors and the particle-swarm method
are strongly correlated when $R_{PRP} = R_{PS}$ and $\beta_{PRP} = \beta_{PS}$.

Any variation in the influence vectors between the particle-swarm method and
PageRank-Priors is due in part to particle death when $|\mathrm{out}(c_i)| = 0$ (Alg. 1, line 22).
Since PageRank-Priors models a random-walker's home return as a jump to any
node in $R$, all nodes in $R$ have an equal probability of being jumped to
(assuming $p(r) = \frac{1}{|R|}$). On the other hand, a particle, when returning home, returns
to its initial node, $c_i = h_i$ (Alg. 1, line 19). If a particular outgoing path from
an initial node is atrophied, then the potential for $|\mathrm{out}(c_i)| = 0$ is greater and the
potential for $p(r) = \frac{1}{|R|}$ is less. Even at $\beta = 1.0$, particle death is still a possibility.
The rationale for designing the particle framework in this manner is to ensure
decentralization for extended applications of the particle-swarm method. No particle
has knowledge of $R$, only of its particular initial, or root, node, $h_i$.



5 Optimizations and Running Time 

This section extends the current particle-swarm model with two optimizations:
iteration constraining and random seeding. Currently, the running time of
the particle-swarm method is $O(|P| + |P|t)$, where $|P|$ is the number of particles
used in the simulation and $t$ is the number of particle propagation iterations. In
comparison, the running time of both PageRank and PageRank-Priors is $O(|E|t)$,
where $E$ is the set of edges in the network and $t$ is the number of iterations required
for convergence [13, 14]. It is important to note that $|P|$ is a function of $|N|$, not
$|E|$, and for most real-world networks $|N| \ll |E|$. An accurate particle-swarm
simulation of PageRank is possible when $|P(n_k)| = 1$ and therefore $|P| = 1000$.
For a $\gamma = 2.5$ scale-free network of 1000 nodes, $|E| \approx 2575$. Therefore, the
Big-O speed-up, given 20 iterations for each algorithm, is a factor of approximately

$$2.45 \approx \frac{(2575)(20)}{(1000) + (1000)(20)} = \frac{|E|t}{|P| + |P|t}.$$

Greater gains are seen in the particle-swarm's simulation of PageRank-Priors when
$|R| < |N|$. The particle population of a node is a proportion of the total population,
$|P(r_k)| = \frac{|P|}{|R|}$. This ratio allows a smaller particle population to be used when
simulating PageRank-Priors without degrading the accuracy of the calculation.
Notice that $|P(r_k)| = \frac{|P_{PRP}|}{|R|} = \frac{|P_{PR}|}{|N|}$, where $P_{PRP}$ and $P_{PR}$ are the particle sets for
PageRank-Priors and PageRank, respectively. For $|P| = |R|$, the particle-swarm
algorithm has a running time of $O(|P| + |R|t)$ when simulating PageRank-Priors.
The PageRank-Priors particle-swarm simulation is more efficient in terms of
running time than the original, and only, published analysis of $O(|E|t)$ [2]. The
benefits of the particle-swarm simulation of PageRank-Priors are best realized when
$|R| \ll |N| \ll |E|$.

These calculations assume that the particle-swarm method and PageRank/PageRank-Priors
share the same number of iterations, $t_{PS} = t_{PR}$, and that the particle-swarm
method has a homogeneous initial particle seeding of at least 1 particle per
node. Both of these parameters can be reduced to lower the particle-swarm's
running time, with varying effects on the correlation. The following variables
are discussed in the following subsections and are presented here for ease of
reference.

(1) $t_{PS}$: number of iterations to propagate particles, $t_{PS} \in \mathbb{N}^+$

(2) $\phi$: proportion of nodes to receive an initial seeding of particles, $\phi \in [0, 1]$

(3) $\alpha$: number of particles per node in the initial seeding, $\alpha \in \mathbb{N}^+$

(4) $S$: the set of nodes receiving particles from the initial seeding, $S \subseteq N$ and $|S| = \phi|N|$



5.1 Constraining Particle Iterations and Random Particle Seeding

Algorithm 1, line 12, assumes that a particle propagates for the same number of
iterations as PageRank, $t_{PS} = t_{PR}$. This is a costly method since, to determine $t_{PR}$,
PageRank must be executed. Another way of determining the number of iterations
for the particle-swarm method is to wait until all particles have died, which occurs
when a particle's energy content has decayed to $\epsilon_i = \vartheta$ or when $c_i$ no longer has
outgoing edges. For $\delta = 0.15$, and when $c_i$ always has at least one outgoing edge,
particle death occurs after 113 iterations ($\vartheta = 10^{-8}$), while PageRank
converges after 22.7 iterations on average on a $\gamma = 2.5$ scale-free network. This is
not the fastest method either. Therefore, Figure 3a plots the correlation between
the particle-swarm method and PageRank as the particle-swarm method's iteration
value is constrained, $t_{PS} \in [1, 25]$. The range $25 < t_{PS} < 113$ is omitted due
to insignificant variation in the algorithm's behavior. The result demonstrates that
the particle-swarm method is strongly correlated with PageRank, $C = 0.953$, after
only 4 iterations, $t_{PS} = 4$.
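
The particle lifetime quoted above can be checked with a short calculation. A particle that starts with energy 1.0 and never teleports or dies early holds energy $(1 - \delta)^t$ after $t$ decays, so the crossing point with the threshold follows from a logarithm. The function name `decay_steps_to_threshold` is hypothetical.

```python
import math

def decay_steps_to_threshold(delta, threshold):
    """Real-valued t that solves (1 - delta) ** t == threshold."""
    return math.log(threshold) / math.log(1.0 - delta)

steps = decay_steps_to_threshold(0.15, 1e-8)
```

For a decay-scalar of 0.15 and a threshold of $10^{-8}$ the crossing falls between 113 and 114 decays, in line with the roughly 113 iterations stated in the text.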

Given different gamma values, the number of iterations should vary. For example,
a $\gamma = 2.0$ scale-free network with $|N| = 1000$ requires only 12.52 iterations for
PageRank to converge. Similarly, the particle-swarm method requires only 3.01
iterations to produce a $C \approx 0.95$. At the other extreme, a $\gamma = 3.0$ scale-free
network requires approximately 28.88 iterations for PageRank to converge while the
particle-swarm method requires 6.23 iterations. The general trend, though non-linear, for
producing a $C \approx 0.95$ is that $t_{PS}$ is a small fraction of $t_{PR}$ (roughly one fifth to one
quarter in the cases reported here) or, for each $\gamma \in [2.0, 3.0]$, $t_{PS} \approx 2\gamma$.

Fig. 3. a. PS vs. PR over $t \in [1, 25]$ b. PS vs. PR over $\phi \in [0, 1]$

The particle-swarm method can also be optimized by randomly choosing a subset
of the network to initially seed with particles, $S \subseteq N$. This random subset can
be expressed as a proportion of the whole network, $\phi|N|$, where $\phi \in [0, 1]$ and
$|S| = \phi|N|$. Figure 3b plots the correlation between PageRank and the
particle-swarm method for different initial particle seed proportions. It is shown that at
$\phi = 0.24$, when only 24% of the nodes in the network are seeded with a single particle,
the Pearson correlation is approximately 0.95. Therefore, an accurate PageRank
calculation does not require all nodes to begin with an equal set of particles. Thus,
$|P| < |N|$.



5.2 Combining the Optimizations 



The combination of both optimizations is represented in Figure 4, where each
initial seed proportion, $\phi \in [0.01, 0.5]$, is calculated for every iteration amount,
$t_{PS} \in [1, 25]$. Next, Figure 5 plots the iteration amount against the seeding proportion
for the lowest value pairs obtaining a $C \approx 0.95$. Each plot point's shade value is
calculated as $\phi t_{PS}$, which represents the cost of performing that parameter pair.
Therefore, to achieve a $C \approx 0.95$, the most computationally efficient way is to
use single particles ($\alpha = 1$), initially distributed over a moderate number of nodes
($\phi \approx 0.45$), and propagated over a moderate number of time steps ($t_{PS} \approx 8$).



The speed-up of the particle-swarm method with respect to PageRank is given
in Eq. (7). Since $\phi|N|\alpha$ represents the particle population, the full
running time can still be expressed as $O(|P| + |P|t_{PS})$. The numerator in Eq. (7)
is based on the standard PageRank implementation running time of $O(|E|t_{PR})$ [13, 14].

$$\text{speed-up} = \frac{|E|t_{PR}}{(\phi|N|\alpha) + (\phi|N|\alpha)t_{PS}} \qquad (7)$$



Fig. 5. $\phi t$ (cost) for various iteration/seed proportion pairs at $C \approx 0.95$

For a $\gamma = 2.5$ scale-free network of $|N| = 1000$, the theoretical speed-up of the
fastest particle-swarm method yielding a $C \approx 0.95$ ($\alpha = 1$, $\phi = 0.45$, $t = 8$)
is calculated to be $14.43 = \frac{(2575)(22.7)}{(0.45)(1000)(1) + (0.45)(1000)(1)(8)}$. To verify this
hypothesis, PageRank, as implemented in [14], was compared against the most optimized
particle-swarm method. The benchmark testing was done over 500 trials of 500
different $\gamma = 2.5$ scale-free networks of $|N| = 1000$, with the average speed-up
factor determined to be 22.23. The increased benchmark speed-up relative to the
theoretical speed-up may be due in part to the fact that, over
the course of the particle-swarm algorithm, particles die before all iterations are
complete (Alg. 1, lines 15, 17, 23). Therefore, the general rule is that as $t$ increases, $|P|$
decreases.
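
The speed-up arithmetic in this section can be reproduced with a small helper. This is a sketch of the ratio $|E|t_{PR}$ over $|P| + |P|t_{PS}$ with $|P| = \phi|N|\alpha$. The function name `speed_up` and its parameter names are hypothetical, and the inputs are the values quoted in the text.

```python
def speed_up(n_edges, t_pr, n_nodes, phi, alpha, t_ps):
    """Ratio of PageRank cost |E| * t_PR to particle-swarm cost |P| + |P| * t_PS."""
    particles = phi * n_nodes * alpha            # |P| = phi * |N| * alpha
    return (n_edges * t_pr) / (particles + particles * t_ps)

# Optimized setting: 45% of nodes seeded with one particle, 8 iterations
optimized = speed_up(2575, 22.7, 1000, phi=0.45, alpha=1, t_ps=8)
# Unoptimized setting: one particle on every node, 20 iterations for both methods
full_seed = speed_up(2575, 20, 1000, phi=1.0, alpha=1, t_ps=20)
```

These evaluate to about 14.43 and 2.45, matching the two theoretical factors given in Section 5. The benchmarked factor of 22.23 is an empirical measurement and is not produced by this formula.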



6 Conclusion 

Due to the popularity of the global-rank implementation of PageRank there
exists much literature on efficient implementations of the algorithm. One particular
example is an algorithm that partitions the graph into related influence
clusters [15]. The graph clustering method groups nodes of a similar PageRank into a
hyper-node and then calculates the full converging PageRank vector on the newly
constructed hyper-network. In this way, the clustering method is able to reduce the
total number of edges, $E$, iterated over. The publication states that the typical edge
reduction between the original network and the hyper-network is a factor of 20 for
networks containing billions of edges. The paper states a Spearman correlation of
0.95 and a 2-fold increase in calculation time relative to a 'highly optimized'
implementation of PageRank. Edge reduction, by way of node grouping, also reduces
the number of nodes in the network. Therefore, there is a strong incentive to
combine the graph clustering method and the particle-swarm method. This has not yet
been tested.

Finally, the space requirements of the particle-swarm method are larger than those of
traditional matrix methods since these methods do not represent particles, only the
influence vector, $I$, and the adjacency matrix of the graph. This matrix representation lends
itself to efficient space modifications [16]. The particle-swarm implementation
discussed in this paper is calculated solely in main memory for small networks
of fewer than 10,000 nodes. This test-bed implementation is not useful for
calculations on web-sized networks. Future work will describe a system
architecture for performing particle-swarm algorithms on large-scale networks.

The particle-swarm method for graph analysis has an appeal in its potential for
functional modification. From the object-oriented perspective, a particle can be
seen as an 'agent' that can contain any number of properties and behaviors.
Modifying the particle-swarm framework presented in this paper can
lead to a host of augmentations to the mentioned influence metrics. One example
is the incorporation of 'negative' energy particles to reduce specific node
influence, as explained in [5]. New particle-swarm metrics are currently being
implemented and results will be presented in future publications. This paper's
simulations were performed using the Confluence package [17]. Our Confluence API has
been written such that the basic 'energy' particle framework can be easily
extended with new particles.